Correlation processing for motion estimation

ABSTRACT

Motion correlation is a way of analyzing movement in image sequences such as television signals. The estimation of movement in television pictures is essential to enable the use of motion compensated processing techniques. These techniques yield improved quality video processing and moving image compression. Motion correlation would be used as part of a video motion estimation system. This invention describes a number of enhancements to, and a simplified, efficient implementation of, the basic motion correlation algorithm. It is shown how to produce motion correlation surfaces co-timed with output, rather than input, pictures. It is also shown how to average power spectra to obtain improved noise performance, and hence improved accuracy, in motion correlation. Finally it is shown that separate correlation surfaces can be produced relating to `future` and `past` pictures.

This Application is the U.S. national phase application of PCT international application number PCT/EP96/03056.

The invention relates to video signal processing and to an improved method of motion correlation that can be used as part of a system to measure movement in television pictures (references 1, 3, 6, 8, 19 & 21 in the appendix). Motion estimation, in television pictures, is important because it allows a range of signal processing techniques to be used that give improved performance (references 2, 3, 4, 5, 10, 11, 12, 13, 14, 15, 16, 18, 19 & 20).

Television signals originated by a television camera are, conventionally, interlaced. Television signals originated on film are not, fundamentally, interlaced although they are formatted on an interlaced lattice. There is potential confusion in the use of the terms `field` and `frame` used to describe television systems. To avoid this confusion the term `picture` is used throughout and can be taken to mean either field or frame depending on the application.

One way in which motion estimation can be performed is as a two stage process (references 3, 4, 13, 14, 15, 16, 18 & 19). First a moving scene is analysed to determine what movements it contains. This first stage of the analysis would produce a list of several different motions that may be present in the scene. This list might contain, for example, the distinct motion of foreground and background objects. Each of the motion vectors, produced by the first stage of analysis, is then tested to determine whether it applies to any given pixel. The first stage of analysis is non-localised and it is the second stage of analysis that locates the spatial position of the different movements. This invention is concerned with the first stage of analysis.

One, conventional, way in which movement in image sequences can be analysed is by the use of cross-correlation. Cross-correlation is performed on two successive images in a sequence. The cross-correlation function is expected to have peaks at positions corresponding to displacements between the two images. With distinct foreground and background objects in an image the cross-correlation between successive images would be expected to give two peaks corresponding to the two different movements. Unfortunately the shapes of the peaks in the cross-correlation surface depend strongly on the (2 dimensional) spectrum of the image. Since the energy in image spectra is typically concentrated at low frequencies the peaks in cross-correlation surfaces are, correspondingly, typically rounded and indistinct. The rounded shape of typical cross correlation peaks makes determining the position of the centre of the peak very difficult. Therefore, motion analysis using cross correlation is very inaccurate.

Phase correlation has been used as an improved method of motion analysis (references 4, 18 & 19). The phase correlation function is similar to cross correlation. The phase correlation function of two successive images in a sequence would also be expected to exhibit peaks in positions corresponding to movements in the image. Phase correlation, in contrast to cross correlation, uses normalised, or `whitened` spectra prior to correlation. This gives much sharper peaks in the phase correlation surface for most images. The sharp peaks enable the displacement between two successive images to be accurately measured.
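By way of illustration only, the whitening step that distinguishes phase correlation from cross correlation may be sketched as follows. The function name phase_correlate, the use of the NumPy library and the synthetic test image are assumptions made for this sketch and form no part of the cited references; a cyclic shift is used so that the discrete Fourier transform models the displacement exactly.

```python
import numpy as np

def phase_correlate(a, b):
    """Phase correlation surface of two equal-sized real images (illustrative)."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = B * np.conj(A)            # cross power spectrum
    mag = np.abs(cross)
    mag[mag == 0] = 1.0               # avoid dividing by zero at empty frequencies
    return np.real(np.fft.ifft2(cross / mag))   # whitened, then re-transformed

rng = np.random.default_rng(0)
first = rng.standard_normal((32, 32))
# Second image: the first displaced cyclically by 3 lines and 5 pixels.
second = np.roll(first, shift=(3, 5), axis=(0, 1))

surface = phase_correlate(first, second)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(surface), surface.shape))
print(peak)  # (3, 5)
```

Because every frequency is given unit magnitude before the inverse transform, the surface for a pure displacement is a single sharp sample at the displacement, regardless of the image spectrum.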

Motion correlation provides an analysis of the motion in an image sequence based on its three dimensional, spatio-temporal spectrum. The motion correlation algorithm is defined as follows. Let the brightness (or similar function of the image) be represented by g(x,y,t); where x, y & t represent the horizontal, vertical and temporal co-ordinates of the image sequence respectively.

1. Estimate the complex, 3 dimensional, image spectrum G(m,n,f) where m, n & f are horizontal, vertical and temporal frequencies respectively and F represents the Fourier transform operation.

    G(m,n,f)=F(g(x,y,t))                                       Equation 1

2. Normalise the spatio-temporal power spectrum by dividing it by the spatial power spectrum D(m,n); * represents complex conjugate.

    N(m,n,f)=G(m,n,f)·G*(m,n,f)/D(m,n)                         Equation 2

where; D(m,n)=∫G(m,n,f)·G*(m,n,f)df

3. Re-transform the normalised spatio-temporal power spectrum, N(m,n,f), to the spatio-temporal domain; F⁻¹ represents the inverse Fourier transform operation.

    s(x,y,t)=F⁻¹(N(m,n,f))                                     Equation 3

4. Sample the normalised image, s(x,y,t), at time t_c to give a motion correlation function, c(x,y). Note that t_c can be a positive or negative (not zero) integer number of picture periods.

    c(x,y)=s(x,y,t_c)                                          Equation 4

For a raster scanned television image the co-ordinates would be discrete, referring to discrete pixels, lines or pictures, and Fourier transforms would be implemented using the discrete Fourier transform. Note that the original algorithm has a square root in equation 2. This seems an unnecessary and undesirable complication that is ignored in this document.
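Steps 1 to 4 above may be sketched, purely as an illustration, using discrete Fourier transforms. The function name motion_correlation, the array layout (pictures, lines, pixels), the NumPy library and the synthetic panning sequence are assumptions of this sketch. The test sequence is chosen to be periodic over the six pictures analysed so that the cyclic transforms introduce no leakage; real pictures would need the window functions discussed later in this document.

```python
import numpy as np

def motion_correlation(g, t_c=1):
    """Motion correlation surface c(y, x) for g of shape (pictures, lines, pixels)."""
    G = np.fft.fftn(g)                        # Equation 1: 3-D complex spectrum
    power = np.abs(G) ** 2                    # G . G*, spatio-temporal power spectrum
    D = power.sum(axis=0, keepdims=True)      # sum over temporal frequency: D(m,n)
    N = power / np.where(D > 0, D, 1.0)       # Equation 2 (no square root)
    s = np.real(np.fft.ifftn(N))              # Equation 3
    return s[t_c]                             # Equation 4: sample at t_c pictures

rng = np.random.default_rng(0)
base = rng.standard_normal((12, 24))
# Pan by 2 lines and 4 pixels per picture over 6 pictures (cyclic shifts).
g = np.stack([np.roll(base, (2 * t, 4 * t), axis=(0, 1)) for t in range(6)])

c = motion_correlation(g, t_c=1)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(c), c.shape))
print(peak)  # (2, 4): lines and pixels moved per picture period
```

The peak position, read relative to the origin, gives the velocity in lines and pixels per picture period because the surface is sampled one picture period after time zero.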

Motion correlation is intended to remove information, about the objects in a scene, from the 3-D spectrum of an image sequence, leaving only motion information. Once the information, such as the position and shape of the edges within an object, is removed the spectrum of a `standard` moving object remains. The `standard` object is a bright point moving through the spatial origin at time zero. Re-transforming the spectrum of this `standard` point object (or objects), to the space and time domain, reveals its (their) motion, from which the motion of the original object(s) can be inferred.

Motion correlation is most simply explained by considering a single spatial dimension, as illustrated in FIG. 1. The (2 dimensional) spatial spectrum of an image is an alternative representation of the image. Essentially the phase part of the spatial spectrum describes the positions of edges in the image whilst the magnitude part (spatial power spectrum) describes the profile of the edges. The (3 dimensional) spatio-temporal spectrum of the image sequence contains motion information as well. In a stationary scene all the energy, in the spatio-temporal spectrum, lies at zero temporal frequency. For a moving object the energy is skewed to higher temporal frequencies as shown in FIG. 1. In motion correlation, first the phase information is removed from the spatio-temporal spectrum by calculating the spatio-temporal power spectrum: (G(m,n,f)·G*(m,n,f) in equation 2). This removes information on the position of edges in the image. Then the spatial power spectrum (D(m,n) in equation 2) is calculated by summing the energy for all temporal frequencies. The spatio-temporal power spectrum is divided by the spatial power spectrum to give the normalised spatio-temporal power spectrum (equation 2). This division removes information on edge profiles in the image. All that is now left in the normalised spatio-temporal power spectrum is motion information. The motion information is recovered from the normalised spatio-temporal power spectrum by re-transforming it to the spatio-temporal domain (equation 3). This gives a moving bright point object, which passes through the spatial origin at time zero. The velocity of the point object corresponds to the velocity of the object in the original image sequence. The velocity is found by locating the point object at time t_c, knowing that it passed through the origin at time zero. That is, by finding the peak or peaks in the correlation surface the original motion(s) can be inferred.

Motion correlation has a number of advantages over other correlation techniques for motion analysis of image sequences. The advantages include measuring motion rather than displacement, improved noise performance, improved measurement of multiple velocities and insensitivity to prior temporal filtering of the image sequence.

Motion correlation analysis of a sequence of images produces a measurement of velocity, not just displacement. Other correlation techniques measure displacement not velocity. If only two images are used in the analysis it is possible only to measure displacement, as is the case for both cross- and phase-correlation. If the image sequence contains consistent motion then measuring displacement also measures velocity. The assumption of consistent motion, implicit in other correlation techniques, is not always true. Consider, for example, the image of two, stationary, snooker balls on a snooker table. Since the balls look identical, both cross and phase correlation will measure the displacement between the balls as well as the true zero velocity. Motion correlation, by contrast, will only measure the true zero velocity because the other `motion` is not consistent across the sequence of images.

Motion correlation exhibits better noise performance than phase correlation, particularly at low velocities. This enables the peaks in motion correlation surfaces to be located more precisely and hence motion to be measured more accurately. The improved noise performance comes from the use of more input pictures in the analysis and the way in which spectral normalisation is performed.

Motion correlation should have improved ability to measure multiple motions in the image sequence. Multiple motions commonly arise, for example, from the independent motion of foreground and background objects in a scene. With cross and phase correlation techniques multiple movements are confused if the spatial power spectra of the two objects overlap. Independent measurement of foreground and background motion requires, at least partially, non-overlapping spectra with these other techniques. This may occur if, for example, the background contains mainly low frequencies and the foreground mainly high frequencies, but is, by no means, guaranteed. Motion correlation, by contrast, can, in principle, measure two distinct motions even if the spatial spectra of the objects completely overlap. This additional discrimination results from using more images in the analysis.

Since motion correlation measures velocity rather than displacement it is insensitive to prior temporal filtering of the image sequence. Cross and phase correlation techniques, by contrast, are easily confused by such temporal filtering. Temporal filtering may arise in the context of television systems from, for example, the use of field or frame comb decoders for decoding composite (PAL or NTSC) colour signals. Temporal filtering of a moving image sequence produces multiple images in the filtered sequence. Cross or phase correlation measures the displacement between each of the multiple images in each input image. This results in multiple, erroneous, motions being detected by these other correlation techniques, but not by motion correlation. Consider the example of an image, panning with a velocity (v) and subject to a temporal filter with a three picture aperture. The filtered output image, at time t₀, contains contributions from input pictures at times t₋₁, t₀ & t₁. Similarly the output image at t₁ contains contributions from input pictures at t₀, t₁ & t₂. Cross or phase correlating these two filtered pictures results in correlation peaks corresponding to velocities -v, 0, v, 2v, 3v. Four of these five measured velocities are erroneous. Motion correlation is not subject to this problem because each of the multiple images in the filtered sequence is moving with the same velocity.
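The five apparent velocities in the panning example can be enumerated directly, as the following sketch shows. The function name apparent_displacements is an assumption made for illustration; picture times are expressed as integer picture periods, so each result is a multiple of the true velocity v.

```python
def apparent_displacements(aperture_a, aperture_b):
    """Displacements (in units of v) seen when correlating two filtered pictures.

    Each filtered picture contains copies of the scene from every input
    picture in its aperture; every pairing of copies produces a peak.
    """
    return sorted({tb - ta for ta in aperture_a for tb in aperture_b})

picture_at_t0 = [-1, 0, 1]   # contributions to the filtered picture at t0
picture_at_t1 = [0, 1, 2]    # contributions to the filtered picture at t1

peaks = apparent_displacements(picture_at_t0, picture_at_t1)
print(peaks)  # [-1, 0, 1, 2, 3]
```

Only the entry equal to 1 corresponds to the true velocity v; the other four are artefacts of the temporal filter.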

Motion correlation has been proposed as a method of motion analysis in image sequences (reference 3). It has a number of significant advantages over other techniques, as described above; however, the direct implementation of the motion correlation algorithm is extremely computationally intensive. It is an object of the present invention to overcome this problem and to provide an efficient technique for motion analysis that is a significant improvement over existing techniques.

The invention provides a method of processing a plurality of consecutive pictures from a video signal for a motion estimation technique wherein a motion correlation function can be generated corresponding to an arbitrary time instant. In particular, the correlation function produced can be generated co-timed with output, rather than input pictures. This enables motion estimation to be more accurate when motion vectors are required on a different standard to the input image.

The invention also provides a method of processing a video signal comprising, for each picture, calculating complex spatio-temporal spectra of different spatial regions of a picture using a 3-D discrete Fourier transform (DFT) including a temporal window function, performing a modulus squared operation to find the spatio-temporal power spectra, averaging the spatio-temporal power spectra spatially across different picture regions, normalising the average spatio-temporal power spectrum using the spatial power spectrum derived from the averaged spatio-temporal power spectra as the normalisation factor, re-transforming the normalised spatio-temporal power spectrum into the spatio-temporal domain using an inverse DFT, and temporally sub-sampling to produce a motion correlation output. The step of averaging the spatio-temporal power spectra is preferably performed on adjacent correlation regions. This gives an improved estimate of the spatio-temporal power spectrum.

The invention also provides a method of processing video signals for motion estimation comprising, for each picture, calculating the complex spatio-temporal spectra of a signal by first calculating the complex spatial spectrum of each region of the picture using a spatial discrete Fourier transform (DFT) and subjecting the result to a temporal DFT including a temporal window function, averaging spatio-temporal power spectra determined from the complex spatio-temporal power spectra across different picture regions, normalising the average power spectrum, re-transforming the normalised spatio-temporal power spectrum into the spatio-temporal domain using an inverse temporal DFT and an inverse spatial DFT, and performing temporal sub-sampling between the two inverse DFTs.

The step of averaging the spatio-temporal power spectra may be implemented using, for example, transversal filters, that is, an inter-region spatial filter.

The normalisation step may use the spatial power spectra averaged across different picture regions as the normalisation factor where the spatial power spectrum is calculated in the temporal frequency domain. Alternatively, the spatial power spectra are calculated in the time domain and averaged across both different picture regions and over successive pictures to produce the normalisation factor.

The invention further provides a method of processing video signals for motion estimation comprising, for each picture, calculating the complex spatial spectra of a signal using a spatial DFT, normalising the complex spatial spectra using the spatial power spectra averaged across both different picture regions and successive pictures as the normalisation factor, calculating the normalised complex spatio-temporal spectra using a temporal DFT, averaging the normalised spatio-temporal power spectra across different picture regions, re-transforming the averaged normalised spatio-temporal power spectrum into the spatio-temporal domain by performing an inverse temporal DFT and an inverse spatial DFT, and sub-sampling the spatio-temporal power spectrum between the two inverse DFTs.

The spatial power spectra and spatio-temporal power spectra may be spatially averaged using transversal filters such as an inter-region spatial filter. The step of temporal averaging over successive pictures may be performed using a temporal transversal filter. Furthermore, the inverse temporal DFT may be performed on the normalised spatio-temporal power spectrum and the inter-region spatial filter may operate after the temporal sub-sampling and before the spatial DFT.

The forward temporal DFT preferably includes a time varying window function.

The invention also provides a method of processing video signals for motion estimation wherein the steps inclusive of the temporal DFT and the inverse temporal DFT are replaced by a temporal auto-correlation operation. The auto-correlation operation also replaces the temporal sub-sampling step. As the result produced by the auto-correlation function is only a single (temporal) value, the computational complexity is reduced. When the auto-correlation is implemented in the time domain, the time-varying window function of the substituted temporal DFT is replaced by a time varying interpolation filter.

The invention further provides a method of processing video signals comprising, for each picture, calculating the normalised complex spatial spectra for different regions of a picture, performing a temporal auto-correlation on the normalised spectrum and temporally filtering to produce a motion correlation output. The step of temporally filtering may be implemented using two separate filters, one having only future pictures in its aperture, the other having only past pictures in its aperture.

In the above methods, the discrete Fourier transforms may be implemented using a fast Fourier transform (FFT) algorithm.

The invention further provides apparatus for performing the method of the invention.

The invention will now be described in more detail and with reference to the accompanying drawings, in which:

FIG. 1 shows graphically the spatio-temporal spectra of stationary and moving objects;

FIG. 2 shows two window functions centred differently with respect to input pictures;

FIG. 3 shows the basic implementation of motion correlation in which the power spectra from adjacent correlation regions are averaged;

FIGS. 4 to 12 show alternative implementations of motion correlation according to the invention.

The basic motion correlation algorithm (described in reference 3) can be enhanced in two ways. The error in estimating the spatio-temporal power spectrum of the input image sequence can be reduced by averaging several power spectra in a technique analogous to that of periodogram averaging in one dimensional spectral estimation (reference 9). By reducing the error, in the estimated spatio-temporal power spectrum, the accuracy of the motion analysis can be improved. For some sorts of video processing, notably standards conversion, it is useful to be able to analyse the image sequence at arbitrary time instants, unrelated to times at which input pictures are sampled. Motion correlation can be modified so that a motion correlation function can be generated corresponding to an arbitrary time instant. This is not possible, by contrast, for cross or phase correlation.

Correlation analysis is often performed on parts of an image rather than the whole image. This is done to prevent the correlation being confused by the presence of too many different movements. Typically a conventional television image might be subdivided into about 100 (roughly) square regions. These regions may simply be juxtaposed to cover the entire image. The regions may also overlap so that each pixel in the original image appears in multiple correlation regions. Typically the correlation regions might overlap so that about half the pixels were common to adjacent regions horizontally and vertically. With this 2:1 overlap, both horizontally and vertically, each input pixel would appear in 4 different correlation regions.
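The count of four regions per pixel follows from two overlapping regions in each dimension, as the following sketch illustrates. The function name regions_covering and the region size of 64 with a step of 32 on a 720 by 576 picture are assumed values chosen only to demonstrate 2:1 overlap; pixels close to the picture edge fall in fewer regions.

```python
def regions_covering(position, region_size, step, picture_size):
    """Number of correlation regions, along one axis, that contain a position."""
    starts = range(0, picture_size - region_size + 1, step)
    return sum(1 for s in starts if s <= position < s + region_size)

# Assumed layout: 64-sample regions stepped by 32 samples (2:1 overlap).
horizontal = regions_covering(100, region_size=64, step=32, picture_size=720)
vertical = regions_covering(200, region_size=64, step=32, picture_size=576)

print(horizontal, vertical, horizontal * vertical)  # 2 2 4
```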

The spatio-temporal power spectrum may be estimated by direct application of the discrete Fourier transform (as described in reference 9). It is well known that power spectral estimates derived in this way tend to be noisy (reference 9). In one dimensional spectral analysis the problem of noisy estimates is addressed by averaging several independent power spectra to reduce the noise. This is known as `periodogram averaging`. This technique can be extended to reduce the noise in estimating the spatio-temporal power spectrum for motion correlation. If multiple correlation regions are analysed, as described above, power spectra from adjacent regions can be averaged to reduce noise. Typically one might average power spectra of the current correlation region and its eight immediately, and diagonally, adjacent regions. A weighted average of these regions might also be used with, for example, the four immediately adjacent regions being weighted half as much as the current region and the four diagonally adjacent regions weighted by one quarter. Reducing the noise in the spatio-temporal power spectrum will reduce the noise in the motion correlation surface leading to more accurate motion analysis.
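The weighted average described above (centre weight one, immediate neighbours one half, diagonal neighbours one quarter) may be sketched as follows. The function name average_region_spectra, the array layout in which the first two axes index the correlation region, the division by the sum of the weights and the replication of edge regions at the picture boundary are all assumptions of this sketch; the text does not specify how boundary regions are treated.

```python
import numpy as np

# Weights for the current region and its eight neighbours, as described above.
WEIGHTS = np.array([[0.25, 0.5, 0.25],
                    [0.5,  1.0, 0.5],
                    [0.25, 0.5, 0.25]])

def average_region_spectra(power):
    """Average power spectra over neighbouring correlation regions.

    power[i, j, ...] holds the power spectrum of correlation region (i, j).
    Edge regions are replicated at the picture boundary (an assumption).
    """
    rows, cols = power.shape[:2]
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (power.ndim - 2)
    padded = np.pad(power, pad, mode="edge")
    out = np.zeros(power.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += WEIGHTS[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out / WEIGHTS.sum()      # normalise so a flat input is unchanged

# A 5 x 5 grid of regions, each with a one-sample "spectrum" for brevity.
impulse = np.zeros((5, 5, 1))
impulse[2, 2, 0] = 4.0
smoothed = average_region_spectra(impulse)
print(smoothed[2, 2, 0], smoothed[2, 1, 0], smoothed[1, 1, 0])  # 1.0 0.5 0.25
```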

The process of television standards conversion can be improved by using motion estimation. Standards conversion is the process of converting between different television standards, particularly those with different picture rates. The archetypal standards conversion is between European television with a picture rate of 50 Hz and American television with a picture rate of 60 Hz. This can be performed using motion compensated interpolation as described in references 2, 3, 4, 7, 13, 14, 15 & 16. For some methods of motion compensated interpolation (references 3, 4, 13, 14, 15, 16 & 18) it is necessary to sample motion vectors on the output standard. It is therefore advantageous to be able to analyse motion in the input picture sequence at temporal sampling instants corresponding to output pictures. The output picture sampling instants do not, generally, correspond to input picture sampling instants and the relative timing varies with time.
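The varying relative timing can be made concrete for the 50 Hz to 60 Hz case. In the sketch below the function name output_positions is an assumption, and the picture rates are taken as the nominal values 50 and 60 quoted above, with the first output picture assumed to coincide with an input picture.

```python
from fractions import Fraction

def output_positions(in_rate, out_rate, count):
    """Timing of each output picture, in units of input picture periods."""
    ratio = Fraction(in_rate, out_rate)   # input periods per output period
    return [k * ratio for k in range(count)]

positions = output_positions(50, 60, 7)
# Fractional part: how far each output picture lies past the preceding input picture.
phases = [p % 1 for p in positions]
print([str(p) for p in phases])  # ['0', '5/6', '2/3', '1/2', '1/3', '1/6', '0']
```

Under these assumptions the relative timing steps through six distinct phases and then repeats, so an analysis co-timed with output pictures needs a differently positioned temporal window for each phase.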

In motion correlation, a sequence of input pictures is analysed so as to estimate motion. It is not reasonable to assume that the motion in television pictures remains constant for very long. One tenth of a second is a reasonable period over which to assume constant motion and this limits motion correlation to analysing only 5 or 6 conventional television pictures. With such a small number of input pictures to analyse it is important to make best use of the available information. An improved estimate of the spatio-temporal power spectrum can be achieved by multiplying the input image sequence by a carefully selected temporal window function. Window functions are well known in spectral estimation theory and are used to reduce bias and leakage in the estimated power spectrum and to localise the spectrum in time.

The centre of the temporal window function, used in estimating the spatio-temporal power spectrum, determines the instant for which the motion correlation analysis is valid. FIG. 2 shows two window functions centred differently with respect to the input pictures. In the upper window function the resulting motion analysis is valid coincident with an input picture. In the lower window function the analysis is valid midway between two input pictures. The precise instant for which motion correlation analysis is valid can be changed by varying the timing of the centre of the window function relative to the input pictures. This allows motion correlation analyses to be generated coincident with output, rather than input pictures in, for example, a standards converter. This, in turn, makes motion estimation more accurate where motion vectors are required on a different standard to the input images. For cross and phase correlation, by contrast, the timing of the correlation analysis is completely determined by the input picture timing and cannot be varied.
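The effect of moving the window centre can be sketched by computing the weight applied to each input picture. The raised-cosine shape, the six picture aperture and the function name window_weights are assumptions made for illustration only; the shape of the windows in FIG. 2 is not specified here.

```python
import math

def window_weights(centre, n_pictures=6):
    """Weights applied to input pictures by a raised-cosine window.

    centre is the instant of analysis in input picture periods; the window
    spans n_pictures periods symmetrically about it (assumed shape).
    """
    half_width = n_pictures / 2
    first = math.floor(centre - half_width) + 1   # first picture inside the window
    weights = {}
    for t in range(first, first + n_pictures):
        u = (t - centre) / half_width             # -1..1 across the window
        weights[t] = 0.5 * (1 + math.cos(math.pi * u)) if abs(u) < 1 else 0.0
    return weights

on_picture = window_weights(0.0)   # analysis coincident with input picture 0
mid_way = window_weights(0.5)      # analysis midway between pictures 0 and 1

print({t: round(w, 3) for t, w in on_picture.items()})
print({t: round(w, 3) for t, w in mid_way.items()})
```

In the first case the weights are symmetric about picture 0; in the second they are symmetric about the point midway between pictures 0 and 1, which is the instant for which the analysis is then valid.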

Varying the position of the temporal analysis window allows motion correlation analyses to be produced corresponding to any instant of time. The analysis, however, is still performed with respect to the input sampling lattice. This means that measuring the position of a peak in the motion correlation function will give velocity in units of input pixels (or picture lines) per input picture period.

It is noted that the spatial location of the motion correlation analysis can also be varied by adjusting the spatial centre of a spatial window function. This can be done for cross, phase and motion correlation analyses but is probably of limited utility.

The most basic implementation of motion correlation is shown in FIG. 3. This implementation is computationally intensive and must operate on signals with a wide dynamic range. In this form, using today's technology, it is only really practicable to implement it using floating point arithmetic in a non-real time computer realisation (as described in reference 3). For practical, real time, implementation it is desirable both to simplify the algorithm and reduce the dynamic range of the signals being operated upon.

The input to the 3-D FFT, in FIG. 3, is, for example, a raster scanned video signal. The FFT includes a temporal (& spatial) window function. The temporal timing of the output motion correlation function can be varied by adjusting the temporal window function as described above. To generate motion correlation functions co-timed with output pictures, on a different television standard, requires the use of output standard timing signals to control positioning of the temporal window function used in the FFT. The FFT produces a separate, 3-D, complex spectrum for every output picture period; the output picture period may be the same as the input picture period. Typically the spectral analysis might use, say, 6 pictures. The last 6 pictures are stored within the FFT unit. For every output picture period the FFT will generate the equivalent of 6 pictures of spectral data. This is indicated as 6 signals out of the FFT unit.

Spectral analysis using 6 pictures is assumed for the rest of this document. However this is not essential and different numbers of pictures could be used in the analysis.

FIG. 3 implements equations 1 to 4 above and in addition the power spectra from adjacent correlation regions are averaged to reduce noise, as described above. The forward FFT implements equation 1. The modulus square, integrator and divider implement equation 2. The spatial power spectrum output from the integrator is D(m,n) in equation 2. Integration over frequency is achieved by simply summing the available samples of the spatio-temporal power spectrum for all frequency co-ordinates. The inverse FFT implements equation 3. Equation 4 is implemented by temporal sub-sampling, which simply discards most of the data.

A first step in simplifying the implementation of motion correlation is to pipeline the calculation of the three dimensional discrete Fourier transform as shown in FIG. 4. In FIG. 4 the results of calculating the spatial complex spectra, an intermediate step in calculating the 3-D spectrum, are reused in the calculation of multiple 3-D spectra. By retaining and reusing these results repeated, identical, calculations are avoided, thereby improving efficiency. Further efficiency improvements are achieved by re-positioning temporal sub-sampling between inverse temporal and inverse spatial Fourier transforms. This avoids the calculation of many inverse spatial transforms which would simply be discarded.

If correlation regions are juxtaposed in the original image (rather than overlapping) then the spectra of these regions can be juxtaposed in the same fashion in the intermediate processed signals. These intermediate processed signals thus constitute a `patchwork` of spectra. With the intermediate spectra juxtaposed in this way, averaging power spectra, for noise reduction, can be achieved using a spatial transversal filter. To average adjacent correlation regions, rather than adjacent frequencies within a single correlation region, the delays in the transversal filter must be equal to the dimensions of the correlation region, rather than single pixels or lines. With suitable obvious modifications, averaging power spectra can be implemented with transversal filters even if correlation regions overlap in the original picture sequence. With this technique in mind averaging spatio-temporal power spectra is shown as a filtering operation in FIG. 4.

In FIGS. 3 and 4 the spatial power spectrum, used as a normalising factor, is calculated in the temporal frequency domain. Using Parseval's theorem it is also possible to calculate the spatial power spectrum in the time domain as illustrated in FIG. 5. Doing this simplifies the integration, which can now be implemented as a temporal transversal filter. Note that the filter used to integrate the `raw` (un-averaged) spatial power spectrum must also perform an interpolation between input and output picture rates. Note also that FIG. 5 shows two spatial filters which are respectively required to average the power spectra from adjacent correlation regions.

In equation 2 the spatial power spectrum, used as a normalising factor, is given by;

    D(m,n)=∫|G(m,n,f)|² df                                     Equation 5

Using Parseval's theorem the spatial power spectrum is also given by;

    D(m,n)=∫|F_f⁻¹(G(m,n,f))|² dt                              Equation 6

where F_f⁻¹ represents the inverse Fourier transform with respect to temporal frequency, f, only. This mathematical identity is implemented in FIG. 5. The implementation shown in FIG. 5 has an output mathematically identical to the basic implementation of motion correlation in FIG. 3; it is simply a more convenient implementation.
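The equivalence of equations 5 and 6 can be checked numerically for discrete transforms, as in the sketch below. The function name spatial_power_both_ways and the random test data are assumptions of this sketch. With the unnormalised forward transform used by NumPy, the sum over temporal frequency exceeds the sum over time by a factor equal to the number of pictures; this constant is a convention of the transform and not a property of the method.

```python
import numpy as np

def spatial_power_both_ways(g):
    """Return D(m,n) computed over temporal frequency and over time.

    g has shape (pictures, lines, pixels).
    """
    spatial = np.fft.fft2(g, axes=(1, 2))      # complex spatial spectrum per picture
    G = np.fft.fft(spatial, axis=0)            # temporal DFT gives G(m,n,f)
    pictures = g.shape[0]
    d_freq = (np.abs(G) ** 2).sum(axis=0) / pictures   # Equation 5 (rescaled)
    d_time = (np.abs(spatial) ** 2).sum(axis=0)        # Equation 6
    return d_freq, d_time

rng = np.random.default_rng(1)
sequence = rng.standard_normal((6, 8, 8))
d_freq, d_time = spatial_power_both_ways(sequence)
print(np.allclose(d_freq, d_time))  # True
```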

FIG. 5 is difficult to implement because of the wide dynamic range of the complex spatial spectrum input to the temporal FFT. This difficulty can be removed by re-positioning the normalisation before the temporal FFT as illustrated in FIG. 6. This change also simplifies the temporal integration because the temporal integrator is no longer required to interpolate between input and output picture rates. Hence a non-interpolating transversal filter can be used to perform temporal integration. This is combined, in FIG. 6, with the inter-region spatial filter used to average power spectra from adjacent correlation regions.

The implementations shown in FIGS. 5 and 6 result in quantitatively different motion correlation outputs. However, the temporal bandwidth of the (temporally averaged) spatial power spectrum is small because the temporal integrator acts as a low pass filter. Thus the modified correlation surface produced by FIG. 6 is regarded as qualitatively similar to the motion correlation function defined in equations 1 to 4.

Having moved the point at which normalisation is performed in the processing chain, further simplifications are possible. The inter-region spatial filter, used for averaging normalised spatio-temporal power spectra in adjacent correlation regions, is a purely spatial operation. Therefore it is commutative with the purely temporal operations of the temporal inverse FFT and temporal sub-sampling. Hence spatial filtering can be performed after temporal sub-sampling, as shown in FIG. 7, without affecting the motion correlation output. This reduces computational complexity because now the spatial filter is only required to operate on a single stream of data rather than the many in FIG. 6. If six pictures are used in motion correlation analysis then this change saves 5/6 of the hardware required for the inter-region spatial filter.

In FIG. 7 there are a forward Fourier transform, modulus squared operation and inverse Fourier transform next to each other. The juxtaposition of these three operations is equivalent to an auto-correlation. Furthermore these three operations produce the complete auto-correlation function. Motion correlation, however, only requires a single sample of the auto-correlation function, which is selected by the sub-sampling operation. Hence all four operations in FIG. 7 can be replaced by a correlation operation that produces only a single (temporal) value of the auto-correlation function, as shown in FIG. 8. This gives a significant reduction in computational complexity.
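The equivalence between the transform chain and a single correlation value can be illustrated for one spatial frequency, whose value over six pictures forms a short complex sequence. The function names and the random test sequence below are assumptions of this sketch; the correlation here is cyclic, as is implied by the use of discrete transforms.

```python
import numpy as np

def cyclic_autocorr_fft(x):
    """Complete cyclic auto-correlation: forward DFT, modulus squared, inverse DFT."""
    X = np.fft.fft(x)
    return np.fft.ifft(np.abs(X) ** 2)

def cyclic_autocorr_single_lag(x, lag):
    """Only the one auto-correlation sample that sub-sampling would select."""
    return np.sum(x * np.conj(np.roll(x, lag)))

rng = np.random.default_rng(2)
# Temporal sequence of one complex spectral coefficient over 6 pictures.
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)

full = cyclic_autocorr_fft(x)
single = cyclic_autocorr_single_lag(x, 1)
print(np.allclose(full[1], single))  # True
```

The direct form needs one complex multiplication per picture for the selected lag, whereas the transform chain computes every lag and then discards all but one.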

The auto-correlation in FIG. 7, implemented using FFTs, is actually a cyclic auto-correlation. A cyclic correlation assumes a periodic extension of the signal being correlated. It is both more appropriate and easier to implement a true auto-correlation, as shown in FIG. 9. In each stage of the process in FIG. 9, the normalised complex spatial spectrum is multiplied by the complex conjugate of the preceding spectrum, where the * symbol, in FIG. 9, represents the complex conjugate operation. Note that, in principle, the auto-correlation can contain an arbitrary number of stages; 6 stages are required to analyse 6 input pictures. The results from each stage are summed to give the auto-correlation output. Replacing the cyclic auto-correlation with a true auto-correlation further modifies the motion correlation output. This modification is believed to be beneficial rather than detrimental.
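The difference between the cyclic and the true auto-correlation can be made concrete for a single spatial-frequency bin. The sketch below is an illustration under stated assumptions: it forms one product per adjacent pair of pictures, it does not reproduce how the stages of FIG. 9 are counted, and the sample values and function names are made up for the example.

```python
def true_autocorr_lag1(values):
    """Sum of each value times the complex conjugate of the preceding one (no wrap-around)."""
    return sum(cur * prev.conjugate() for prev, cur in zip(values, values[1:]))

def cyclic_autocorr_lag1(values):
    """Cyclic version: the first value is additionally paired with the last."""
    # values[t - 1] with t == 0 indexes the last element, giving the wrap-around term.
    return sum(values[t] * values[t - 1].conjugate() for t in range(len(values)))

# Made-up values of one normalised spatial-frequency bin over six pictures.
samples = [1 + 0j, 0.5 + 0.5j, -0.2 + 0.9j, -0.8 + 0.3j, -0.6 - 0.4j, 0.1 - 1j]

wrap_term = samples[0] * samples[-1].conjugate()
difference = cyclic_autocorr_lag1(samples) - true_autocorr_lag1(samples)
```

The two results differ by exactly the wrap-around product of the first picture with the last, which is the term introduced by the assumed periodic extension.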

A more convenient, but equivalent, implementation of the auto-correlation is shown in FIG. 10.

The first temporal Fourier transform, in FIG. 7, includes the temporal window function, which determines the time at which the motion correlation analysis applies. The position of the centre of the window varies with the relative timing of input and output pictures as described above. When the auto-correlation is implemented in the time domain, as illustrated in FIG. 8, the time-varying window function is replaced by a time varying temporal interpolation filter, as shown in FIG. 10. In FIG. 10 the picture delay is an input picture delay. The filter averages the multiplier output over time to generate the required auto-correlation value. It also changes the temporal sampling rate of the auto-correlation value from the input picture rate to the output rate.
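One possible form of a time varying temporal interpolation filter is sketched below. It is hypothetical throughout: the triangular window shape, its half-width, the normalisation of the weights to unit sum and the function names are assumptions chosen for illustration and are not taken from FIG. 10. The point shown is only that the filter weights are recomputed for each output picture from its timing relative to the input pictures.

```python
def interpolation_weights(output_time, input_times, half_width):
    """Triangular window centred on output_time, sampled at the input picture times.

    The weights are scaled to sum to 1 (an assumption made for this sketch).
    """
    raw = [max(0.0, 1.0 - abs(t - output_time) / half_width) for t in input_times]
    total = sum(raw)
    return [w / total for w in raw]

def interpolate(multiplier_outputs, weights):
    """Weighted sum of per-picture multiplier outputs: one auto-correlation value."""
    return sum(w * p for w, p in zip(weights, multiplier_outputs))

input_times = [0, 1, 2, 3, 4, 5]  # measured in input picture periods

# An output picture co-timed with input picture 2.
w_a = interpolation_weights(2.0, input_times, 3.0)
# An output picture midway between input pictures 2 and 3.
w_b = interpolation_weights(2.5, input_times, 3.0)

# Made-up multiplier outputs for one spatial-frequency bin.
outputs = [0.9 + 0.1j, 0.8 + 0.2j, 1.0 + 0.0j, 0.7 + 0.3j, 0.9 - 0.1j, 0.8 + 0.0j]
value_a = interpolate(outputs, w_a)
value_b = interpolate(outputs, w_b)
```

Evaluating the weights at the output picture times, rather than the input picture times, is what converts the auto-correlation value from the input picture rate to the output rate in this sketch.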

Once motion correlation is implemented using time domain auto-correlation, as in FIG. 8, the method can be further modified by re-positioning filters in the signal processing chain. In FIG. 8 a temporal interpolation filter (implicit in FIG. 8 and explicit in FIG. 10) and an inter-region spatial filter precede the spatial inverse Fourier transform. These filters may be re-positioned after the Fourier transform because all these operations are linear operations and, hence, commutative. This is shown in FIG. 11. Moving the filters results in reduced computational complexity since the filters now operate on real, rather than complex, signals. This reduces the filter complexity by, approximately, a factor of four.
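The re-positioning of the temporal filter can be illustrated with a small one-dimensional example. The sketch below makes several assumptions that are not in the figures: pictures are one-dimensional and shifted cyclically, each spectral value is normalised by its own magnitude (a simplification of the normalisation described above), and the three filter taps and function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse carries the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def shifted(signal, v):
    """Cyclic shift of a 1-D 'picture' by v samples."""
    n = len(signal)
    return [signal[(i - v) % n] for i in range(n)]

# Four made-up 1-D pictures of an object moving 2 samples per picture.
base = [4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pictures = [shifted(base, 2 * t) for t in range(4)]
n = len(base)

# Assumed normalisation: divide each spectral value by its own magnitude.
spectra = [[v / abs(v) for v in dft(p)] for p in pictures]
# One multiplier output per adjacent pair: spectrum times conjugate of the preceding one.
products = [[c * p.conjugate() for p, c in zip(prev, cur)]
            for prev, cur in zip(spectra, spectra[1:])]
taps = [0.25, 0.5, 0.25]  # hypothetical temporal filter, one tap per product

# Filter before the inverse spatial transform: the filter sees complex data.
filtered = [sum(w * prod[k] for w, prod in zip(taps, products)) for k in range(n)]
complex_surface = dft(filtered, inverse=True)
surface_a = [v.real for v in complex_surface]
max_imag = max(abs(v.imag) for v in complex_surface)

# Filter after the inverse spatial transform: the filter sees real data.
real_surfaces = [[v.real for v in dft(prod, inverse=True)] for prod in products]
surface_b = [sum(w * s[i] for w, s in zip(taps, real_surfaces)) for i in range(n)]

max_difference = max(abs(a - b) for a, b in zip(surface_a, surface_b))
peak_index = surface_a.index(max(surface_a))
```

Both orderings give the same correlation surface in this sketch, its imaginary part is negligible because the pictures are real, and its peak lies at the displacement of 2 samples per picture that was built into the test data.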

FIG. 11 is a computationally less intensive implementation of motion correlation. As a result of the modifications described above, its output differs from the motion correlation output given by equations 1 to 4.

The implementation of FIG. 11 leads to a further enhancement of the motion correlation method. In motion compensated image processing, to which end this invention is directed, the treatment of revealed and concealed background is important. Revealed background is only present in `future` pictures, whereas concealed background is only present in `past` pictures (reference 17). If a motion estimation system is required to consider revealed background then it is useful if the motion analysis considers only `future` pictures. Performing such a `forward` analysis ensures that the analysis is not confused by concealed background. Similarly a `backward` motion analysis, considering only `past` pictures, is useful for concealed background.

The simplified implementation of motion correlation, in FIG. 11, can be modified to provide separate forward and backward motion correlation analyses, as shown in FIG. 12. To do this the temporal interpolation filter of FIG. 11 is split into two temporal halves. One half contains only `future` pictures in its aperture and generates a forward motion correlation surface. The other half contains only `past` pictures in its aperture and generates a backward motion correlation surface. The full motion correlation surface, as generated by FIG. 11, could be recovered by simply summing the forward and backward correlation surfaces. This modification could be of considerable advantage in motion estimation for processing revealed and concealed background. Separate forward and backward correlation surfaces might prove particularly advantageous when processing a `cut` in a video sequence.
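The splitting of the temporal filter into two halves can be sketched as follows. The tap values, the time offsets, the rule that a tap belongs to the `future` half when its offset from the output picture is not negative, and the function names are all assumptions made for the illustration and are not taken from FIG. 12.

```python
def split_weights(weights, offsets):
    """Split filter taps into a 'future' half (offset >= 0) and a 'past' half (offset < 0)."""
    forward = [w if o >= 0 else 0.0 for w, o in zip(weights, offsets)]
    backward = [w if o < 0 else 0.0 for w, o in zip(weights, offsets)]
    return forward, backward

def filter_surfaces(surfaces, weights):
    """Weighted sum over time of per-picture-pair correlation surfaces."""
    return [sum(w * s[i] for w, s in zip(weights, surfaces))
            for i in range(len(surfaces[0]))]

# Time of each surface relative to the output picture, in input picture periods.
offsets = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
weights = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]  # hypothetical filter taps

# Made-up 1-D correlation surfaces, one per adjacent pair of pictures.
surfaces = [[0.1, 0.9, 0.0], [0.2, 0.8, 0.0], [0.1, 0.7, 0.2],
            [0.0, 0.3, 0.7], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]

forward_w, backward_w = split_weights(weights, offsets)
forward_surface = filter_surfaces(surfaces, forward_w)
backward_surface = filter_surfaces(surfaces, backward_w)
full_surface = filter_surfaces(surfaces, weights)

recovered = [f + b for f, b in zip(forward_surface, backward_surface)]
max_difference = max(abs(r - s) for r, s in zip(recovered, full_surface))
```

In the made-up data the dominant peak moves between the `past` and `future` surfaces, as it might across a cut, so the forward and backward surfaces peak at different positions while their sum still reproduces the full surface.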

This invention is an enhanced motion correlation technique and an efficient method for its implementation. Motion correlation is a technique for analysing the motion of one or more objects in an image sequence. It might form part of a system for estimating motion in television pictures. Once television motion has been estimated the motion vectors can be used to implement improved video processing algorithms.

This invention presents several enhancements to the original motion correlation algorithm published in reference 3. It is shown how to produce motion correlation surfaces co-timed with output, rather than input, pictures. This might be particularly advantageous in a television standards converter system. It is also shown how to average power spectra to obtain improved noise performance, and hence improved accuracy, in motion correlation. Finally it is shown that separate correlation surfaces can be produced relating to `future` and `past` pictures. This may be helpful in processing areas of concealed and revealed background in image sequences.

An efficient implementation of motion correlation is presented in FIGS. 11 and 12. This implementation was developed, in a series of stages, from the basic implementation shown in FIG. 3. The basic implementation of FIG. 3 is only really practicable for non-real time implementation using a floating point data format. In other words FIG. 3 is only suitable for use in a computer simulation. FIG. 11, by contrast, can readily be implemented in real time using easily available commercial semiconductor technology. Minor modifications are required to the basic motion correlation algorithm in order to effect this simplification. These changes do not result in a qualitative degradation in the output correlation surfaces.

I claim:
 1. A method of processing a plurality of consecutive pictures from a video signal for a motion estimation technique comprising, for each picture, calculating complex spatio-temporal spectra of different spatial regions of a picture using a 3-D discrete Fourier transform (DFT) including a temporal window function, performing a modulus squared operation to find the spatio-temporal power spectra, averaging the spatio-temporal power spectra spatially across different picture regions, normalizing the average spatio-temporal power spectrum using the spatial power spectrum derived from the averaged spatio-temporal power spectra as the normalization factor, re-transforming the normalized spatio-temporal power spectrum into the spatio-temporal domain using an inverse DFT, and temporally sub-sampling to produce a motion correlation output, wherein a motion correlation function is generated corresponding to a predetermined time instant by applying said temporal window function in the calculation of the spatio-temporal power spectrum.
 2. A method of processing a video signal as claimed in claim 1, wherein the step of averaging the spatio-temporal power spectra is preferably performed on adjacent correlation regions.
 3. A method of processing a plurality of consecutive pictures from a video signal for a motion estimation technique comprising, for each picture, calculating the complex spatio-temporal spectra of a signal by first calculating the complex spatial spectrum of each region of the picture using a spatial DFT and subjecting the result to a temporal DFT including a temporal window function, averaging spatio-temporal power spectra determined from the complex spatio-temporal power spectra across different picture regions, normalizing the average power spectrum, re-transforming the normalized spatio-temporal power spectrum into the spatio-temporal domain using an inverse temporal DFT and an inverse spatial DFT, and performing temporal sub-sampling between the two inverse DFTs wherein a motion correlation function is generated corresponding to a predetermined time instant by applying said temporal window function in the calculation of the spatio-temporal power spectrum.
 4. A method of processing video signals as claimed in claim 3, wherein the step of averaging the spatio-temporal power spectra is implemented using transversal filters.
 5. A method of processing video signals as claimed in claim 3, wherein the spatial power spectrum is calculated in the temporal frequency domain and the spatial power spectra averaged across different picture regions is used as the normalization factor.
 6. A method of processing video signals as claimed in claim 3, wherein the spatial power spectra are calculated in the time domain and averaged across both different picture regions and over successive pictures to produce the normalization factor.
 7. A method of processing video signals for motion estimation comprising, for each picture, calculating the complex spatial spectra of a signal using a spatial DFT, normalizing the complex spatial spectra using the spatial power spectra averaged across both different picture regions and successive pictures as the normalization factor, calculating the normalized complex spatio-temporal spectra using a temporal DFT, averaging the normalized spatio-temporal power spectra across different picture regions, re-transforming the averaged normalized spatio-temporal power spectrum into the spatio-temporal domain by performing an inverse temporal DFT and an inverse spatial DFT, and sub-sampling the spatio-temporal power spectrum between the two inverse DFTs.
 8. A method of processing video signals as claimed in claim 7, wherein the spatial power spectra and spatio-temporal power spectra are spatially averaged using transversal filters.
 9. A method of processing video signals as claimed in claim 7, wherein the step of temporal averaging over successive pictures is performed using a temporal transversal filter.
 10. A method of processing video signals as claimed in claim 7, wherein the inverse temporal DFT is performed on the normalized spatio-temporal power spectrum and the inter-region spatial filter operates after the temporal sub-sampling and before the spatial DFT.
 11. A method of processing video signals as claimed in claim 10, wherein the steps inclusive of the temporal DFT and the inverse temporal DFT are replaced by a temporal auto-correlation operation.
 12. A method of processing video signals as claimed in claim 11, wherein the auto-correlation operation also replaces the temporal sub-sampling step.
 13. A method of processing video signals as claimed in claim 11, wherein the auto-correlation is implemented in the time domain and the time-varying window function of the substituted temporal DFT is replaced by a time varying interpolation filter.
 14. A method of processing video signals as claimed in claim 7, wherein the forward temporal DFT includes a time varying window function.
 15. A method of processing video signals comprising, for each picture, calculating the normalized complex spatial spectra for different regions of a picture, performing a temporal auto-correlation on the normalized spectrum, spatially filtering using an inter-region spatial filter, and temporally filtering to produce a motion correlation output.
 16. A method of processing video signals as claimed in claim 15, wherein the step of temporally filtering is implemented using two separate filters, one having only future pictures in its aperture, the other having only past pictures in its aperture.